Supervised detection of regulatory motifs in DNA sequences.
Authors
Abstract
Identification of transcription factor binding sites (regulatory motifs) is a major interest in contemporary biology. We propose a new likelihood-based method, COMODE, for identifying structural motifs in DNA sequences. Commonly used methods (e.g. MEME, Gibbs motif sampler) model binding sites as families of sequences described by a position weight matrix (PWM) and identify PWMs that maximize the likelihood of the observed sequence data under a simple multinomial mixture model. This model assumes that the positions of the PWM correspond to independent multinomial distributions, each with four cell probabilities (one per nucleotide). We address how to supervise the search for DNA binding sites using information derived from structural characteristics of protein-DNA interactions. We extend the simple multinomial mixture model to a constrained multinomial mixture model by incorporating constraints on the information content profiles or on specific parameters of the motif PWMs. The parameters of this extended model are estimated by maximum likelihood using a nonlinear constrained optimization method. Likelihood-based cross-validation is used to select model parameters such as motif width and constraint type. The performance of COMODE is compared with existing motif detection methods on simulated data that incorporate real motif examples from Saccharomyces cerevisiae. The proposed method is especially effective when the motif of interest appears as a weak signal in the data. Some of the transcription factor binding data of Lee et al. (2002) were also analyzed using COMODE, and biologically verified sites were identified.
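The two quantities the abstract relies on, the information content profile of a PWM and the likelihood under a multinomial mixture model, can be illustrated with a minimal sketch. This is not the COMODE implementation: the function names, the uniform background, and the treatment of every width-w window as an independent draw from a two-component (motif or background) mixture are assumptions made here for illustration only.

```python
import math

BASES = "ACGT"  # column order assumed for every PWM row in this sketch


def information_content(pwm, background=None):
    """Per-position information content (in bits) of a PWM.

    Each PWM row holds four probabilities (A, C, G, T). The value is the
    relative entropy of that row with respect to the background.
    """
    if background is None:
        background = [0.25] * 4  # assumption: uniform background
    profile = []
    for column in pwm:
        ic = sum(p * math.log2(p / q)
                 for p, q in zip(column, background) if p > 0)
        profile.append(ic)
    return profile


def mixture_log_likelihood(sequences, pwm, mix_weight, background=None):
    """Log-likelihood of all width-w windows under a two-component mixture.

    Assumption: each window is a motif instance with probability
    mix_weight and background otherwise; positions are independent.
    """
    if background is None:
        background = [0.25] * 4
    width = len(pwm)
    total = 0.0
    for seq in sequences:
        for start in range(len(seq) - width + 1):
            idx = [BASES.index(b) for b in seq[start:start + width]]
            p_motif = math.prod(pwm[j][i] for j, i in enumerate(idx))
            p_bg = math.prod(background[i] for i in idx)
            total += math.log(mix_weight * p_motif
                              + (1.0 - mix_weight) * p_bg)
    return total
```

In a constrained fit of the kind the abstract describes, a profile such as the one returned by `information_content` would be the quantity restricted (for example, required to be high at some positions and low at others), while a likelihood such as `mixture_log_likelihood` would be the objective being maximized. How COMODE actually parameterizes and enforces its constraints is not specified in the abstract.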
Related resources
Molecular and Bioinformatics Analysis of Allelic Diversity in IGFBP2 Gene Promoter in Indigenous Makuee and Lori-Bakhtiari Sheep Breeds
The aim of this study was to perform molecular and bioinformatics analysis of the IGFBP2 gene promoter in association with some economic traits in indigenous Makuee (MS) and Lori-Bakhtiari (LB) breeds. DNA was extracted from blood samples of 120 MS and 200 LB and a 297 bp fragment from the upstream sequences of the studied gene was amplified and genotyped by single-strand conformational polymo...
Functional motifs in Escherichia coli NC101
According to recent reports, Escherichia coli (E. coli) bacteria can damage the DNA of gut lining cells and may encourage the development of colon cancer. Genetic switches are specific sequence motifs, and many of them are drug targets. It is therefore of interest to know the motifs and their locations in sequences. In the present study, the Gibbs sampler algorithm was used to predict and find functional m...
Cluster-Buster: finding dense clusters of motifs in DNA sequences
The signals that determine activation and repression of specific genes in response to appropriate stimuli are one of the most important, but least understood, types of information encoded in genomic DNA. The nucleotide sequence patterns, or motifs, preferentially bound by various transcription factors have been collected in databases. However, these motifs appear to be individually too short an...
iMoMi (interactive Motif Mining) - a database and utilities to assist the discovery of new regulatory patterns
Detection of DNA binding motifs for regulatory proteins allows the role of each regulator in cellular metabolism to be assigned with good reliability. With the increasing number of complete genome sequences and the use of transcriptome analysis methods, bioinformatics approaches should help detect most of the potential regulatory motifs, which biologists will be able to confirm by biochemistry ...
Capturing characteristic structural features for motif detection using a hierarchical Bayesian Markovian model
The detection of novel DNA motifs in the regulatory regions of genes provides important information regarding the organization and functional mechanisms of gene regulation. A wealth of biological evidence suggests that biological motifs are more than just conserved nucleotide strings but may have common structural properties underlying the seemingly diverse consensus sequences, such as character...
Journal title:
- Statistical Applications in Genetics and Molecular Biology
Volume 2, Issue
Pages -
Publication date: 2003